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Abstract 

Genes with similar transcriptional activation kinetics can display very 
different temporal mRNA profiles due to differences in transcription time, 
degradation rate and RNA processing kinetics. Recent studies have shown 
that a splicing-associated RNA production delay can be significant. To 
investigate this issue more generally it is useful to develop methods ap¬ 
plicable to genome-wide data sets. We introduce a joint model of tran¬ 
scriptional activation and mRNA accumulation which can be used for in¬ 
ference of transcription rate, RNA production delay and degradation rate 
given data from high-throughput sequencing time course experiments. We 
combine a mechanistic differential equation model with a non-parametric 
statistical modelling approach allowing us to capture a broad range of 
activation kinetics, and use Bayesian parameter estimation to quantify 
the uncertainty in the estimates of the kinetic parameters. We apply the 
model to data from estrogen receptor (ER-a) activation in the MCF-7 
breast cancer cell line. We use RNA polymerase II (pol-II) ClilP-Seq 
time course data to characterise transcriptional activation and niRNA- 
Seq time course data to quantify mature transcripts. We find that 11% 
of genes with a good signal in the data display a delay of more than 
20 minutes between completing transcription and mature mRNA produc¬ 
tion. The genes displaying these long delays are significantly more likely 
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to be short. We also find a statistical association between high delay and 
late intron retention in pre-mRNA data, indicating significant splicing- 
associated production delays in many genes. 


1 Introduction 


Induction of transcription through extracellular signalling can yield rapid changes 
in gene expression for many genes. Establishing the timing of events during this 
process is important for understanding the rate-limiting mechanisms regulating 
the response and vital for inferring causality of regulatory events. Several pro¬ 
cesses influence the patterns of mRNA abundance observed in the cell, including 
the kinetics of transcriptional initiation, elongation, splicing and mRNA degra¬ 
dation. It was recently demonstrated that significant delays due to the kinetics 
of splicing can be an important factor in a focussed study of genes induced by 
Tumor Necrosis Factor (TNF-ct) [I]. Delayed transcription can play an impor¬ 
tant functional role in the cell; for example, inducing oscillations within negative 
feedback loops [2] or facilitating ”just-in-time” transcriptional programmes with 
optimal efficiency ;3j. It is therefore important to identify such delays and to 
better understand how they are regulated. In this contribution we combine RNA 
polymerase (pol-II) ChIP-Seq data with RNA-Seq data to study transcription 
kinetics of estrogen receptor signalling in breast cancer cells. Using an unbiased 
genome-wide modelling approach we find evidence for large delays in mRNA 
production in 11% of the genes with a quantifiable signal in our data. A statis¬ 
tical analysis of genes exhibiting large delays indicates that splicing kinetics is 
a significant factor and can be the rate-limiting step for gene induction. 

A high-throughput sequencing approach is attractive as it gives broad cover¬ 
age and thus allows us to uncover the typical properties of the system. However, 
high-throughput data are associated with significant sources of noise and the 
temporal resolution of our data is necessarily reduced compared to previous 
studies using more focussed PCR-based assays [l,[4]. We have therefore devel¬ 
oped a statistically efficient model-based approach for estimating the kinetic 
parameters of interest. We use Bayesian estimation to provide a principled as¬ 
sessment of the uncertainty in our inferred model parameters. Our model can be 
applied to all genes with sufficiently strong signal in both the mRNA and pol-II 
data with only mild restrictions on the shape of the transcriptional activation 
profile (1814 genes here). 

A number of other works studying transcription and splicing dynamics (e.g. [I] 
forgo detailed dynamical modelling, which limits their ability to properly 
account for varying mRNA half-lives. Our statistical model incorporates a lin¬ 
ear ordinary differential equation of transcription dynamics, including mRNA 
degradation. Similar linear differential equation models have been proposed as 
models of mRNA dynamics previously (4][%. 8], but assuming a specific paramet¬ 
ric form for the transcriptional activity. In contrast, we apply a non-parametric 
Gaussian process framework that can accommodate a quite general shape of 
transcriptional activity. As demonstrated previously |9 - 11 , the linearity of the 


2 




N/ 


Pol-ll V 

_ ChIP-seq -7 


-> 



0 20 40 


160 


f (min) 


320 


Figure 1: A cartoon illustrating the underlying biology and data gathering at a 
single time point (left) and time series modelling (right). The data come from 
pol-II ChIP-seq, summarised over the last 20% of the gene body, and RNA-seq 
computationally split to pre-mRNA and different mRNA transcript expression 
levels. The modelling on the right shows the effect of changing mRNA half-life 
(ti/ 2 ) or RNA production delay (A) on the model response: both induce a delay 
on the mRNA peak relative to the pol-II peak, but the profiles have otherwise 
distinct shapes. 


differential equation allows efficient exact Bayesian inference of the transcrip¬ 
tional activity function. Before presenting our results we outline our modelling 
approach. 

2 Model-based inference of transcriptional de¬ 
lays 

Our modelling approach is summarised in Fig. [T] We model the dynamics of 
transcription using a linear differential equation, 

= Ppi* - A) - am(t) , (1) 

where m(t) is the mature mRNA abundance and p(t) is the transcription rate 
at the 3’ end of the gene at time t which is scaled by a parameter /I since we do 
not know the scale of our p(t) estimates. The parameter A captures the delay 
between transcription completion and mature mRNA production. We refer to 
this as the RNA production delay, defined as the time required for the poly¬ 
merase to disengage from the pre-mRNA and be fully processed into a mature 
transcript. The parameter a is the mRNA degradation rate which determines 
the mRNA half-life (ti /2 = In 2/a). We infer all model parameters (a, /3, A, 
the noise variance and parameters of the Gaussian process covariance function 
discussed below) using a Markov chain Monte Carlo (MCMC) procedure. The 
posterior distribution of the model parameters quantifies our uncertainty and 
we use percentiles of the posterior distribution when reporting credible regions 
around the mean or median values. 
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Figure 2: Left: Heat map of inferred pol-II and mRNA activity profiles after 
MCF7 cells are stimulated with estradiol. Genes with sufficient signal for mod¬ 
elling are sorted by the time of peak pol-II activity in the fitted model. Right: 
Examples of fitted model for six genes. For each gene, we show the fit using 
the pol-II ChIP-Seq data (collected from the final 20% of the transcribed re¬ 
gion) representing the transcriptional activity p(t) (see Eqn. 0). and using the 
RNA-seq data to represent gene expression m{t). Solid red/green lines show 
the mean model estimates for the pol-II/mRNA profiles respectively with asso¬ 
ciated credible regions. In each case we show the posterior distribution for the 
inferred delay parameter A to the right of the temporal profiles. Note that the 
final measurement times are very far apart (the a:-axis is compressed to aid vi¬ 
sualisation) leading to high uncertainty in the model fit at late times. However, 
this does not significantly affect the inference of delays for early induced genes. 


We measure the transcriptional activity p(t) using RNA polymerase (pol-II) 
ChIP-Seq time course data collected close to the 3’ end of the gene (reads lying 
in the last 20% of the transcribed region). Our main assumption is that pol-II 
abundance at the 3’ end of the gene is proportional to the production rate of 
mature mRNA after a possible delay A due to disengaging from the polymerase 
and processing. The mRNA abundance is measured using RNA-Seq reads map¬ 
ping to annotated transcripts, taking all annotated transcripts into account and 
resolving mapping ambiguities using a probabilistic method [12] (see Methods 
Section for details). As we limit our analysis to pol-II data collected from the 
3’-end of the transcribed region, we do not expect a significant contribution to A 
from transcriptional delays when fitting the model. Such transcriptional delays 
have recently been studied by modelling transcript elongation dynamics using 


pol-II ChIP-Seq time course data 13 and nascent mRNA (GRO-Seq) data 14 


in the same system. Here we instead focus on production delays that can occur 
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after elongation is essentially complete. 

Existing approaches to fitting models of this type have assumed a parametric 
form for the activation function p(t) [I|7|[8]. We avoid restricting the function 
shape by using a non-parametric Bayesian procedure for fitting p(t). We model 
p(t) as a function drawn from a Gaussian process which is a distribution over 
functions. The general properties of functions drawn from a Gaussian process 
prior are determined by a covariance function which can be used to specify 
features such as smoothness and stationarity. We choose a covariance function 
that ensures p(t) is a smooth function of time since our data are averaged across 
a cell population. Our choice of covariance function is non-stationary and has 
the property that the function has some persistence and therefore tends to stay 
at the same level between observations (see Supplementary Material for further 
details). The advantage of using a non-parametric approach is that we only 
have to estimate a small number of parameters defining the covariance function 
(two in this case, defining the amplitude and time-scale of the function). If we 
were to represent p(t) as a parametrised function we would have to estimate a 
larger number of parameters to describe the function with sufficient flexibility. 
The Bayesian inference procedure we use to associate each estimated parameter 
with a credible region would be more challenging with the inclusion of these 
additional parameters. 

We have previously shown how to perform inference over differential equa¬ 
tions driven by functions modelled using Gaussian processes [&-11 . The main 
methodological novelty in the current work is the inclusion of the delay term in 
equation (JT|) and the development of a Bayesian inference scheme for this and 
other model parameters. In brief, we cast the problem as Bayesian inference 
with a Gaussian process prior distribution over p(t) that can be integrated out 
to obtain the data likelihood under the model in Eqn. ([I]) assuming Gaussian ob¬ 
servation noise. This likelihood function and its gradient are used for inference 
with a Hamiltonian MCMC algorithm 15] to obtain a posterior distribution 
over all model parameters and the full pol-II and mRNA functions p(t) and 
m(t). 


3 Results 

We model the transcriptional response of MCF-7 breast cancer cells after stimu¬ 
lation by estradiol to activate estrogen receptor (ER-a) signalling. Fig.[2]shows 
the inferred pol-II and rnRNA profiles for all genes with sufficient signal for 
modelling, along with some specific examples of fitted models and estimated 
delay parameters. Before discussing these results further below, we describe the 
application of our method to realistic simulated data to assess the reliability of 
our approach for parameter estimation under a range of conditions. 
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Figure 3: Boxplots of parameter posterior distributions illustrating parameter 
estimation performance on synthetic data for the delay parameter A. The strong 
black lines indicate the ground truth used in data generation. The box extends 
from 25th to 75th percentile of the posterior distribution while the whiskers 
extend from 9th to 91st percentile. The results show that delay estimates are 
accurate and reliable, with the true value always in the high posterior density 
region. 


3.1 Simulated data 


We applied our method to data simulated from the model in Eqn. (JT|) using a p(t) 
profile inferred using pol-II data from the TIPARP gene (gene c in Fig. [2j see 
Supplementary Material for further details over the simulated data). We simu¬ 
lated data using different values of a and A to test whether we can accurately 
infer the delay parameter A. Fig. [3]shows the credible regions of A for different 
ground truth levels (horizontal lines) and for different mRNA degradation rates 
(half-lives given on the ir-axis). The results show that A can be confidently 
inferred with the ground truth always lying within the central part of the cred¬ 
ible region. The maximum error in posterior median estimates is less than 10 
min and when positive, the true value is always above the 25th percentile of 
the posterior. We observed that as the mRNA half-life increases, our confidence 
in the delay estimates is reduced. This is because the mRNA integrates the 
transcriptional activity over time proportional to the half-life leading to a more 
challenging inference problem. We also note that inference of the degradation 
parameter a is typically more difficult than inference of the delay parameter 


A (see Fig. SI). However, a large uncertainty in the inferred degradation rate 
does not appear to adversely affect the inference of the delay parameters which 
are the main focus here. More time-points, or a different spacing of time points, 
would be needed to accurately infer the degradation rates. Additional results 
of delay estimation in a scenario where the simulated half-life changes during 
the time course are presented in Fig. S2 These results demonstrate that the 


obtained delay estimates are reliable even in this scenario. 
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3.2 Estrogen receptor signalling 

We applied our method to RNA-Seq and pol-II ChIP-Seq measurements from 
MCF-7 cells stimulated with estradiol to activate ER-a signalling (see Methods 
section). The measurements were taken from cells extracted from the same pop¬ 
ulation to ensure that time points are directly comparable across technologies. 
Example fits of our model are shown in Fig. [2] The examples show a number 
of different types of behaviour ranging from early induced (a-c) to late induced 
(d-f), and from very short delay (a, d, e) to longer delays (b, c, f). Example 
(e), ECE1, is illuminating because visual inspection of the profiles suggests a 
possible delay, but a more likely explanation according to our model is a longer 
mRNA half-life and the posterior probability of a long delay is quite low. In¬ 
deed, it is well known that differences in stability can lead to delayed mRNA 
expression 16 and therefore delays in mRNA expression peak relative to pol-II 
peak time are not sufficient to indicate a production delay. Changes in splic¬ 
ing can be another potential confounder, but our transcript-based analysis of 
RNA-seq data can account for that. An example of how more naive RNA-seq 
analysis could fail here is presented in Fig. [S3] 

The parameter estimates of the models reveal a sizeable set of genes with 
strong evidence of long delays between the end of transcription and production 
of mature mRNA. We were able to obtain good model fits for 1864 genes. We 
excluded 50 genes with posterior median delay ^120 min, as these are unreliable 
due to sparse sampling late in the time course, which is apparent from broad 
delay posterior distributions. Out of the remaining 1814 genes with reliable 
estimates, 204 (11%) had a posterior median delay larger than 20 min between 
pol-II activity and mRNA production while 98 genes had the 25th percentile of 
delay posterior larger than 20 min, indicating confident high delay estimates. 
A histogram of median delays is shown in Fig. [4] (left). The 120 min long delay 
cut-off was selected by visual observation of model fits which were generally 
reasonable for shorter delays. Note that late time points in our data set are 
highly separated due to the exponential time spacing used and thus the model 
displays high levels of uncertainty between these points (see Fig. [2]). Therefore 
genes displaying confident delay estimates are typically early-induced such that 
time points are sufficiently close for a confident inference of delay time. Our 
Bayesian framework makes it straightforward to establish the confidence of our 
parameter estimates. 


3.3 Genomic features associated with long-delay genes 


Motivated by previous studies 5,6 17 we investigated statistical association 


between the observed RNA production delay and genomic features related to 
splicing. We found that genes with a short pre-mRNA (Fig. [5] left panel) are 
more likely to have long delays. We also find that genes where the ratio of the 
last intron’s length in the longest annotated transcript over the total length of 
the transcript is large (Fig. [5j right panel) are also more likely to have long 
delays, but this effect appears to be weaker. These two genomic features, short 


7 







Delay (min) RNA production delay A (min) 


Figure 4: Left: A histogram of delay posterior medians from 1864 genes found 
to fit the model well. Estimated delays larger than 120 min are considered 
unreliable and are grouped together. These 50 genes were excluded from fur¬ 
ther analysis, leaving 1814 genes for the main analysis. Right: Estimated gene 
transcriptional delay for the longest transcript plotted against the estimated 
posterior median RNA production delay. The transcriptional delay is estimated 
assuming each gene follows the median transcriptional velocity measured in 
Ref. 


14 


. The solid line corresponds to equal delays. 


pre-mRNA and relatively long last introns, are positively correlated, making it 
more difficult to separate their effects. To do so, Fig. |S6] shows versions of the 
right panel of Fig.[5]but only including genes with pre-mRNAs longer than 10 kb 
or 30 kb. The number of genes with long last introns in these sets is smaller and 
the resulting p -values are thus less extreme, but the general shape of the curves 
is the same. We did not find a significant relationship with the absolute length 
of the last intron. This may be because the two observed effects would tend to 
cancel out in such cases. We also checked if exon skipping is associated with long 
delays as previously reported (6 . The corresponding results in Fig. S7 show no 
significant difference in estimated delays in genes with and without annotated 
exon skipping. 


3.4 Analysis of the intronic read and pol-II distribution 

We investigated whether there was evidence of differences in the pattern of 
splicing completion for long-delay genes. To quantify this effect, we developed a 
pre-mRNA end accumulation index: the ratio of intronic reads in the last 50% 
of the pre-mRNA to the intronic reads in the first 50% at late (80-320 min) 
and early (10-40 min) times. Fig. [6] shows that genes with a long estimated 
delay display an increase in late intron retention at the later times. There 
is a statistically significant difference in the medians of index values for short 
and long delay genes (p < 0.01, Wilcoxon’s rank-sum test p-values for different 
short/long delay splits are shown in Fig. [6]). The example on the left of Fig. [6j 
DLX3, is a relatively short gene of about 5 kb and thus differences over time 
cannot be explained by the time required for transcription to complete. The 
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Figure 5: Tail probabilities for delays. Left: genes whose longest pre-mRNA 
transcript is short (m is the length from transcription start to end). Right: genes 
with relatively long last introns (/ is the ratio of the length of the last intron 
of the longest annotated transcript of the gene divided by the length of that 
transcript pre-mRNA). The fraction of genes with long delays A is shown by 
the red and blue lines (left-hand vertical axis). In both subplots, the black curve 
denotes the p -values of Fisher’s exact test for equality of fractions depicted by 
the red and blue curves conducted separately at each point (right-hand vertical 
axis) with the dashed line denoting p < 0.05 significance threshold. Similar 
plots for other values of m and / as well as different gene filter setups are given 


in Figs. S4 - S5 


corresponding analysis for pol-II ChIP-seq reads as well as GRO-seq reads is in 
Fig. S8 It shows a clear delay-associated accumulation to the last 5% nearest 


to the 3’ end, while for pol-II in the last 50% the accumulation is universal. 
These results suggest our short delay genes tend to be efficiently spliced while 
long delay genes are more likely to exhibit delayed splicing towards the 3’ end. 
There is also evidence of some accumulation of pol-II near the 3’ end although 
the effect appears relatively weak. We note that Grosso et al. 18 identified 


genes with elevated pol-II at the 3’-end which were found to be predominantly 
short, consistent with our set of delayed genes, and with nucleosome occupancy 
consistent with pausing at the 3’ end. 


3.5 Relative importance of production and elongation de¬ 
lays 

To better understand what are the rate-limiting steps in transcription dynamics, 
we assessed the relative importance of the observed RNA production delays in 
comparison to transcriptional delays due to elongation time. We estimated elon¬ 
gation times for each gene using assumed transcriptional velocity corresponding 


to the 2.1 kb/min median estimate from 14 combined with the length of the 
longest annotated pre-mRNA transcript. Others (e.g. 13 ) have reported higher 
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Figure 6: Left: We show the density of RNA-Seq reads uniquely mapping to 
the introns in the DLX3 gene, summarised in 200 bp bins. The gene region 
is defined from first annotated transcription start until the end of last intronic 
read. The ratio of the number of intronic reads after and before the midpoint 
of the gene region is used to quantify the 3’ retention of introns. The pre- 
mRNA end accumulation index is the difference between averages of this ratio 
computed over late times (80-320 min) and early times (10-40 min). Right: 
Differences in the mean pre-mRNA accumulation index (left-hand vertical axis) 
in long delay genes (blue) and short delay genes (red) as a function of the cut-off 
used to distinguish the two groups (horizontal axis). Positive values indicate 
an increase in 3’ intron reads over time. The black line shows the p-values of 
Wilcoxon’s rank sum test between the two groups at each cut-off (right-hand 
vertical axis). 


velocities, so this approach should provide reasonable upper bounds on actual 
elongation time for most genes. A comparison of these delays with our posterior 
median delay estimates is shown in Fig. [4] (right). The figure shows the major¬ 
ity of genes with short production delays and moderate elongation time in the 
top-left corner of the figure, but 14.3% (260/1814) of genes have a longer RNA 
production delay than elongation time. 


4 Discussion 

Through model-based coupled analysis of pol-II and mRNA time course data 
we uncovered the processes shaping mRNA expression changes in response to 
estrogen receptor signalling. We find that a large number of genes exhibit 
significant production delays. We also find that delays are associated with short 
overall gene length, relatively long final intron length and increasing late-intron 
retention over time. Our results support a major role for splicing-associated 
delays in shaping the timing of gene expression in this system. Our study 
complements the discovery of similarly large splicing-associated delays in a more 
focussed study of TNF-induced expression 1| indicating that splicing delays are 
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likely to be important determinants of expression dynamics across a range of 
signalling pathways. 

It is known that splicing can strongly influence the kinetics of transcrip¬ 
tion. Khodor et al. carried out a comparative study of splicing efficiency in fly 
and mouse and found a positive correlation between absolute gene length and 
splicing efficiency [5]. This suggests that efficient co-transcriptional splicing is 
facilitated by increased gene length and is consistent with our observation that 
delays are more common in shorter genes. In these genes it appears that the ma¬ 
ture mRNA cannot be produced after transcription until splicing is completed; 
it is splicing rather than transcription that is the rate-limiting step for these 
genes. In the same study it was also observed that introns close to the 3’-end of 
a gene are less efficiently spliced which is consistent with our observation that 
the relative length of the final intron may impact on splicing delays. A fur¬ 
ther theoretical model supporting a link between long final introns and splicing 
inefficiency was recently suggested in Ref. 19 , but it is unclear if it can fully 
explain the observed relationships. 

Our model assumes a constant mRNA degradation rate which may be un¬ 
realistic. Given the difficulty of estimating even a single constant degradation 
rate for simulated data where the true rate is constant, it seems infeasible to 
infer time-varying rates with the current data. On the other hand, estimated 
delays were quite reliably inferred even when we simulated data with a time- 
varying degradation rate (Fig. S2), and hence the potentially incorrect degra¬ 
dation model should not affect the main results significantly. 

It is important to differentiate the delays found here with transcriptional de¬ 
lays required for pol-II elongation to complete. Elongation time can be a signifi¬ 
cant factor in determining the timing of gene induction and elongation dynamics 
has been modelled using both pol-II ChIP-Seq 13 and nascent RNA (GRO- 
Seq) [14] time course measurements in the system considered here. However, in 
this study we limited our attention to pol-II data at the 3’-end of the gene, i.e. 
measuring polymerase density changes in the region where elongation is almost 
completed. Therefore, we will not see transcription delays in our data and the 
splicing-associated delays discussed above are not related to elongation time. 
Indeed, the splicing-associated delays observed here are more likely to affect 
shorter genes where transcription completes rapidly. These splicing-associated 
delays are much harder to predict from genomic features than transcriptional 
delays, which are mainly determined by gene length, although we have shown 
an association with final intron length and gene length. In the future it would 
be informative to model data from other systems to establish associations with 
system-specific variables (e.g. alternative splice-site usage) and thereby uncover 
context-specific mechanisms regulating the delays that we have observed here. 


4.1 Availability 

Raw data are available at GEO (accession GSE62789). A browser of all model 

fits and delay estimates is available at http: //ahonkela.users . cs .helsinki . f i/pol2rna/. 

Code for reproducing the experiments is available at 
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https://github.com/ahonkela/po!2rna. 


5 Methods 


5.1 Data acquisition and mapping 

MCF-7 breast cancer cells were stimulated with estradiol (E2) after being placed 
in estradiol free media for three days, similarly as previously described 13 . We 


measured pol-II occupancy and mRNA concentration from the same cell pop¬ 
ulation collected at 10 time points on a logarithmic scale: 0, 5, 10, 20, 40, 80, 
160, 320, 640 and 1280 min after E2 stimulation. At each time point, the pol-II 
occupancy was measured genome-wide by ChIP-seq and mRNA concentration 
using RNA-Seq. Raw reads from the ChIP-Seq data were mapped onto the 
human genome reference sequence (NCBI_build37) using the Genomatix Min¬ 
ing Station (software version 3.5.2; further details in Supplementary Material). 
On average 84.0% of the ChIP-Seq reads were mapped uniquely to the genome. 
The RNA-seq reads were mapped using bowtie to a transcriptome constructed 
from Ensembl version 68 annotation allowing at most 3 mismatches and ignor¬ 
ing reads with more than 100 alignments. The transcriptome was formed by 
combining the cDNA and ncRNA transcriptomes with pre-mRNA sequences 
containing the full genomic sequence from the beginning of the first annotated 
exon to the end of the last annotated exon. On average 84.7% of the RNA-seq 
reads were mapped. 


5.2 RNA-seq data processing 


mRNA concentration was estimated from RNA-seq read data using BitSeq 12 


BitSeq is a probabilistic method to infer transcript expression from RNA-seq 
data after mapping to an annotated transcriptome. We estimated expression 
levels to all entries in the transcriptome, including the pre-mRNA transcripts, 
and used the sum of the mRNA transcript expressions in FPKM units to esti¬ 
mate the mRNA expression level of a gene. Different time points of the RNA-seq 
time series were normalised using the method of 


20 


5.3 Pol-II ChIP-seq data processing 

The ChIP-seq data were processed into time series summarising the pol-II oc¬ 
cupancy at each time point for each human gene. We considered the last 20% 
of the gene body nearest to the 3’-encl. The gene body was defined from the 
start of the first exon to the end of the last exon in Ensembl 68 annotation. 
The data were subject to background removal and normalisation of time points. 
(Full details in the Supplementary Material.) 
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5.4 Filtering of active genes 

We removed genes with no clear time-dependent activity by fitting time-dependent 
Gaussian process models to the activity curves and only keeping genes with 
Bayes factor at least 3 in favour of the time-dependent model compared to a 
null model with no time dependence. We also removed genes that had no pol-II 
observations at 2 or more time points. This left 4420 genes for which we fitted 
the models. 


5.5 Modelling and parameter estimation 

We model the relationship between pol-II occupancy and mRNA concentration 
using the differential equation in Eqn. 0 which relates the pol-II time series 
p(t) and corresponding mRNA time series m(t) for each gene. We model p{t) 
in a nonparametric fashion by applying a Gaussian Process (GP) prior over the 
shapes of the functions. We sightly modify the model in Eqn. (|T|) by adding a 
constant (3 0 to account for the limited depth of pol-II ChIP-Seq measurements, 


d m(t) 
d t 


= Po + /3p(t - A) - um[t) 


( 2 ) 


This differential equation can be solved for m(t ) as a function of p(t) in closed 
form. The pol-II concentration function p(t) is represented as a sample from a 
GP prior which can be integrated out to compute the data likelihood. The model 
can be seen as an extension of a previous model applied to transcription factor 
target identification 11 . Unlike Ref. 11 , we model p(t) as a GP defined as an 


integral of a function having a GP prior with RBF covariance, which implies that 
p(t) tends to remain constant between observed data instead of reverting back 
to the mean. Additionally we introduce the delay between pol-II concentration 
and mRNA production as well as model the initial mRNA concentration as 
an independent parameter. In the special case where A = 0, and mo = Po/ol, 
Eqn. ( |S3| ) reduces to the previous model (Eqn. 4 in [Tl]). In order to fit the model 
to pol-II and mRNA time course data sampled at discrete times, we assume we 
observe m(t) and p(t) corrupted by zero-mean Gaussian noise independently 
sampled for each time point. We assume the pol-II noise variance is a constant 
<7p and infer it as a parameter of the model. The mRNA noise variances for 
each time point are sums of a shared constant cr^ and a fixed variance inferred 
by BitSeq by combining the technical quantification uncertainty from BitSeq 
expression estimation with an estimate of biological variance from the BitSeq 
differential expression model (full details in Supplementary Material). 

Given the differential equation parameters, GP inference yields a full pos¬ 
terior distribution over the shape of the Pol-II and mRNA functions p(t) and 
m(t). We infer the differential equation parameters from the data using MCMC 
sampling which allows us to assign a level of uncertainty to our parameter es¬ 
timates. To infer a full posterior over the differential equation parameters po, 
P, a , A, mo, E[po\ = p p , the observation model parameters a p , (J 2 rn , and a 
magnitude parameter C p and width parameter l of the GP prior, we set near¬ 
flat priors for them over reasonable value ranges, except for the delay A whose 
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prior is biased toward 0 (exact ranges and full details are presented in Sup¬ 
plementary Material). We combine these priors with the likelihood obtained 
from the GP model after marginalising out p(t) and m(t), which can be per¬ 
formed analytically. We infer the posterior over the parameters by Hamiltonian 
MCMC sampling. This full MCMC approach utilises gradients of the distri¬ 
butions for efficient sampling and rigorously takes uncertainty over differential 
equation parameters into account. Thus the final posterior accounts for both 
the uncertainty about differential equation parameters, and uncertainty over 
the underlying functions for each differential equation. We ran 4 parallel chains 
starting from different random initial states for convergence checking using the 
potential scale reduction factor of 21 . We obtained 500 samples from each of 


the 4 chains after discarding the first half of the samples as burn-in and thin¬ 
ning by a factor of 10. Posterior distributions over the functions p(t) and m{t) 
are obtained by sampling 500 realisations of p(t) and m(t) for each parameter 
sample from the exact Gaussian conditional posterior given the parameters in 
the sample. The resulting posteriors for p[t) and m(t) are non-Gaussian, and 
are summarised by posterior mean and posterior quantiles. Full details of the 
MCMC procedure are in Supplementary Material. 


5.6 Filtering of results 

Genes satisfying the following conditions were kept for full analysis. (Full im¬ 
plementation details of each step are in Supplementary Material.) 

1. p{t) has the maximal peak in the densely sampled region between 1 min 
and 160 min. 

2. Estimated posterior median delay is less than 120 min. 

3. p(t) does not change too much before t = 0 min to match the known start 
in steady state. 

5.7 Analysis of the gene annotation features associated 
with the delays 

Ensembl version 68 annotations were used to derive features of all genes. For 
each annotated transcript, we computed the total pre-mRNA length m as the 
distance from the start of the first exon to the end of the last exon, and the 
lengths of all the introns. Transcripts consisting only of a single exon (and hence 
no introns) were excluded from further analysis. For each gene, we identified 
the transcript with the longest pre-mRNA and used that as the representative 
transcript for that gene. The last intron share / was defined as the length of 
the last intron of the longest transcript divided by m. 


5.8 Pre-mRNA end accumulation index 

For this analysis, we only considered reads aligning uniquely to pre-mRNA 
transcripts and not to any mRNA transcripts. We counted the overlap of reads 
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with 200 bp bins starting from the beginning of the first exon of each gene 
ending with the last non-empty bin. We compute the fraction r e> j of all reads in 
the latter half of bins in each sample i, and define the index as the difference of 
the means of r ei over late time points (80-320 min) and over early time points 
(10-40 min). 
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Supplementary material 

In the following we provide details of data acquisition, processing of RNA-seq 
data and filtering of active genes, processing of pol-II ChIP-seq data, differential 
equation modelling of the connection between pol-II and mRNA, Gaussian pro¬ 
cess based inference of underlying time series, and summarisation and filtering 
of results. We then provide an explanation of how synthetic data were used to 
study accuracy of parameter estimation for mRNA half life, a measure of mRNA 
decay in the differential equation model between pol-II and mRNA. Lastly, we 
provide additional figures about tail probabilities of delays for alternative result 
filtering choices, an additional figure about long posterior mean delays with and 
without annotated exon skipping, and differences in pre-mRNA accumulation 
in short and long delay genes. 

A Data acquisition 

MCF-7 breast cancer cells were treated with estradiol (E2). The cells were put 
in estradiol free media for three days. This is defined media devoid of phenol 
red (which is estrogenic) containing 2% charcoal stripped foetal calf serum. The 
charcoal absorbs estradiol but not other essential serum components, such as 
growth factors. This resulted in basal levels of transcription from E2 dependent 
genes. The cells were then incubated with E2 containing media, which resulted 
in the stimulation of estrogen responsive genes. Measurements were taken at 
logarithmically spaced time points 0, 5, 10, 20, ..., 1280 minutes after E2 
stimulation. 

At each time point, the pol-II occupancy was measured genome-wide by 
ChIP-seq. Raw reads were mapped onto the human genome reference sequence 
(NCBI_build37) using the Genomatix Mining Station (software version 3.5.2). 
The mapping software is an index-based mapper using a shortest unique sub¬ 
word index generated from the reference to identify possible read positions. A 
subsequent alignment step is then used to get the highest-scoring match(es) ac¬ 
cording to the parameters used. We used a minimum alignment quality thresh¬ 
old of 92% for mapping, reads were not trimmed. On average 84% percent of 
reads could be mapped uniquely. 

At each time point, the pre-mRNA and mature mRNA abundances were 
measured for each human gene by RNA-seq. Total RNA was isolated and sub¬ 
jected to rRNA depletion with the Ribo-Zero Magnetic Gold Kit and processed 
further for strand-specific RNA-seq. The RNA-seq reads were mapped using 
Bowtie to a transcriptome constructed from Ensembl version 68 annotation al¬ 
lowing at most 3 mismatches and ignoring reads with more than 100 alignments. 
The transcriptome was formed by combining the cDNA and ncRNA transcrip- 
tomes with pre-mRNA sequences containing the full genomic sequence from the 
beginning of the first annotated exon to the end of the last annotated exon. On 
average 84.7% of the RNA-seq reads were mapped. 

All the ChIP-seq and RNA-seq data are available from the NCBI Gene 
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Expression Omnibus under accession number GSE62789. 


B RNA-seq data processing 


RNA-seq data were analysed at each time point separately using BitSeq (I2| . 
The reads were first mapped to human reference transcriptome (Ensembl v68) 
using Bowtie version 0.12.7 22 . In order to separate pre-mRNA activity as 


well, we augmented the reference transcriptome with pre-mRNA transcripts for 
each gene that consisted of the genomic sequence from the beginning of the first 
exon to the end of the last exon of the gene. 

BitSeq uses a probabilistic model to probabilistically assign multimapping 
reads to transcript isoforms 12 , in our case also including the pre-mRNA tran¬ 
scripts. We obtained gene expression estimates by adding the corresponding 
mRNA transcript expression levels. In addition to the mean expression lev¬ 
els, BitSeq provides variances of the transcript isoform expression levels. We 
further used the biological variance estimation procedure from BitSeq differen¬ 
tial expression analysis on the estimated gene expression levels by treating the 
first three time points (0, 5, 10 min) as biological replicates. Genes with sim¬ 
ilar mean expression levels (log-RPKM) were grouped together such that each 
group contained 500 genes except for the last group with 571 genes with the 
highest expression. Then, the biological variances were estimated for each group 


of genes by using the Metropolis-Hastings algorithm used in BitSeq stage 2 12 


Biological variances for the single measurements were determined according to 
the gene expression levels at each time point, where each gene was considered to 
belong to the closest gene group according to its expression level. The observa¬ 
tion noise variance for each observation was defined as the sum of the technical 
(BitSeq stage 1) and biological (BitSeq stage 2) variances, and transformed from 
log-expression to raw expression using 


= vf og exp(pti og ) 2 . 


(SI) 


Different time points of the RNA-seq time series were normalised using the 
method of (20 as implemented in the edgeR R/Bioconductor package [23]. 
Statistics of RNA-seq mapping and distribution of reads for pre-mRNA and 


mRNA transcripts are presented in Tables S2 and S3 as well as Fig. S9 


C Filtering of active genes 

We removed genes with no clear time-dependent activity by fitting time-dependent 
Gaussian process models to the activity curves and only keeping genes with 
Bayes factor at least 3 in favour of the time-dependent model compared to a 
null model with no time dependence. We also removed genes that had no pol-II 
observations at 2 or more time points. This left 4420 genes for which we fitted 
the models. 
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D Pol II ChIP-seq data processing 

The ChIP-seq data were processed into time series by summarising the pol-II 
occupancy over time for each human gene (Ensembl version 68 annotation was 
used for gene positions), by a series of steps as follows. 

1. Each gene was divided into 200 bp bins and levels of pol-II activity were 
computed at each time point as the total weighted count of reads overlap¬ 
ping each bin, where each read was weighted by how many basepairs in the 
read overlap the bin, as follows. Only uniquely mapped reads were used. 
For any read that at least partially overlaps a bin, the number of basepairs 
overlapping the bin was added into the activity level of the bin. For any 
read spanning multiple bins, the number of basepairs overlapping each bin 
were added into activity of that bin. The Genomatix mapping software 
provides alignment scores (values between 0 and 1; with our threshold 
only between 0.92 and 1) for mapping reads to the genome; for any read 
having alignment score less than 1, the number of overlapping basepairs 
added to each bin was multiplied by the alignment score. 

2. A noise removal was then done: a noise level was computed as the average 
activity level in 74 manually selected regions from Chromosome 1 that 
were visually determined to be inactive over the measurement time points, 
as follows. The regions were divided into 200 bp bins, and total weighted 
counts of reads overlapping each bin (each read weighted by the number 
of basepairs overlapping the bin) were computed in the same way as for 
the genes in the previous step. For each time point, the noise level was 
computed as the average activity level over all bins from all 74 regions. 
The computed noise level was subtracted from the mean of each bin in 
each gene, thresholding the result at zero. A list of the empty regions used 
is included as a Supplementary Dataset SI. 

3. As the number of ChIP-seq reads collected overall for pol-II varies between 
time points, a robust normalisation was done. After the previous noise 
removal step, for each gene g at each time t we compute the mean of 
the remaining activity (activity level after noise removal) over bins of the 
gene, denoted as r gt . The activity levels are weighted counts of basepairs 
from reads overlapping the gene; we select genes having sufficient activity, 
that is, at least 5 • 200 overlapping basepairs from reads over each 200 bp 
bin of the gene, on average over the bins. For each gene g let T g = {t' £ 
{5,10,..., 1280 min}|r gt / > 5 • 200} denote those time points (except the 
first time point) where the gene has sufficient activity. For each time point 
we compute a normalisation factor of [20 



where Median g {-} denotes median over genes and 
GeomMean t / {r g f } = (IIceT,, r gt') l ^ T ^ is the geometric mean over the 
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time points having sufficient activity for gene g. The median is com¬ 
puted for time points after the first time point; for the first time point 
t = 0 min we set C t = 1. The factor Ct normalises all the gene activity 
levels (weighted read counts) at a time point downwards if genes at that 
time point have unusually many reads, exceeding their (geometric) mean 
activity level, and normalises upwards if gene activity levels fall under 
their mean activity level. 

4. Lastly, time series summaries were computed for pol-II at each gene. For 
each gene at each time point f, the mean activity level (weighted read- 
count) of pol-II over bins in the 20% section of the gene nearest to tran¬ 
scription end was computed, normalised by C t . This measured pol-II level 
represents transcriptional activity that had successfully passed through 
the gene to the transcription end site; it is expected to correspond better 
with nrRNA production rate than pol-II activity at the transcription start 
of the gene, since pol-II near the transcription start site can be in the 
active or inactive state and after activation may require a significant time 
for transcription to complete. 

5. For a small number of genes where the active mRNA transcripts covered 
only part of the gene, we considered the area from the first active exon to 
last active exon, and summarised the gene using the 20% section nearest 
to the end of the area. Active transcripts were defined here as transcripts 
with a mean of more than 1.1 assigned counts in the BitSeq posterior 
expression estimates. BitSeq uses a prior that assigns 1 “pseudo-count” 
per transcript, so the active transcripts were only required to have minimal 
posterior expression that was distinguishable from the prior. A list of 
active transcripts is included as Supplementary Dataset S2. 

6. Lastly, for mathematical convenience, for each pol-II time series we sub¬ 
tracted from all time points the minimum value over the time points. 

E Differential equation based modelling 

We model the role of pol-II as a catalyst of the transcription of DNA into 
nrRNA as a differential equation for each gene; the differential equation relates 
the pol-II time series p{t) of the gene and the corresponding mRNA time series 
m(t). 

Let us assume the momentary pol-II activity directly represents the momen¬ 
tary rate of transcription, potentially with a delay, and that the nrRNA decays 
at a constant rate. We model this as a linear differential equation 

= A) + Pp{t - A) - am(t) (S2) 

where A is a delay parameter between the pol-II activity and the momentary 
transcription rate, /3 q is a parameter representing the baseline transcription rate 
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from unobservable pol-II background (baseline production level of mRNA), /3 
is a parameter representing transcriptional efficiency , that is, sensitivity of the 
transcription rate to activity of pol-II, and a is a constant mRNA decay rate 
parameter that is related to mRNA half-life t \/ 2 through a = ln(2)/t 1 / 2 - 

The momentary mRNA level m{t) can be solved from the differential equa¬ 
tion to yield the following solution: 

m{t) = moe^* 0 " 4 ) + — (l - e~ at ) + /3e~ at f e au p(u - A)d u (S3) 

^ J u —0 min 


where pol-II activity is assumed to start at t = 0 min (p(t) = 0 for t < 0 min), 
f 0 > 0 min is the time of the first observation, and mo is an initial mRNA 
abundance at to which is inferred as a parameter of the model. No parametric 
assumptions are made about the shape of the pol-II time series function p(t), 
and the only assumption about the mRNA level m(t) is that it arises through 
the differential equation. 

The linear differential equation (S21 and its linear solution operator (S31 
are similar to those used previously in j9j|ll] except for the added delay. As 
in the previous works, the linearity of the solution operator permits exact joint 
Gaussian process (GP) modelling over p{t) and m{t). 


F Gaussian process inference 

We model pol-II and mRNA time series, p(t) and m(t), in a nonparametric 
fashion which avoids the assumption of a specific parametric shape for the time 
series function; instead, we set a GP prior over the time series functions. 

For each gene, GP inference of the posterior distribution over the underlying 
pol-II and mRNA time series can be done in closed form given fixed values of the 
differential equation parameters. GP inference is based on mean and covariance 
functions. Below we describe the GP model of pol-II and mRNA, their respec¬ 
tive mean functions and covariance functions, and the cross-covariance function 
between pol-II and mRNA. GP inference of the posterior is then a standard 
inference equation which we provide for completeness. 

The above inference provides a posterior distribution over the profiles p(t) 
and m(t) given known values for the differential equation parameters. However, 
these values are not known and to infer a full posterior over both time series and 
these parameters we carry out Markov Chain Monte Carlo (MCMC) sampling 
over the parameter values, as described in Section “Parameter inference by 
Hamiltonian Monte Carlo sampling”. 


F.l GP model of pol-II 

For each gene, we model the pol-II activity time series in a nonparametric 
fashion by applying a GP prior over the shapes of the time series. Previous 
similar GP models [9-11 have used a squared exponential covariance function 
for p(t), as that allows derivation of all the shared covariances in closed form. 
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This covariance has the limitation that it is stationary, and functions following it 
revert to zero away from data. These properties severely degrade its performance 
on our highly unevenly sampled data. To avoid this, we model p(t) as an integral 
of a function having a GP prior with a squared exponential covariance: then the 
posterior mean of p(t) tends to remain constant between observed data. That 
is, we model 

p(t)=Po + f v(t)dt (S4) 

J u=0 

where po is the initial value at time t = 0 min, and assign a GP prior with 
the squared exponential covariance over v(t); as a result p(t) will also have a 
GP prior whose covariance function is an integral of the covariance function 
of v(t). For mathematical convenience we assume p(t) = 0 min for t < 0 min, 
and set the initial observation time to to a sufficiently large value to avoid any 
discontinuity resulting from assumption in pol-II or mRNA modeling. 

To define the GP prior, we first define the mean function of p(t). Assume that 
v(t) is drawn from a zero-mean GP prior with a squared exponential covariance 
function k v (t,t') = C p ■ exp(—(t — t') 2 /l 2 ) where C p is a magnitude parameter 
and l is a length scale, which has been parametrised in a non-standard manner 
to simplify the derivations. Then E[p(t)] = E\po] + f* =to E[v(t)]dt = E[p 0 \ = p p 
for t > 0 min. 

Next we compute the corresponding covariance function for the GP prior of 
p(t). We have 


MM') = E[{p{t) - p P ){p(t') - p p )\ 




s^dsds' 


J s Q “ s )/0 “ erf(-s/0 


ds 


(S5) 


The remaining integral over the erf functions can be computed using inte¬ 
gration by parts. After straightforward manipulation, the integral becomes 


k P (t, t') = Cp ^ 1 (tie rf (ti) + t[e rf (t[) - {t[ - t t )e rf {t[ - t t ) ^ 

+ ( exp + exp ( _ ^) 2 ) ~ exp _ ^) 2 ) - • ( S6 ) 

where we denoted ti =t/l and t\ = t'/l for brevity. 

The right-hand side is the covariance function k p (t,t r ) of the 
integrated squared-exponential GP prior for pol-II. 


F.2 GP model of mRNA 

We model the mRNA abundance in a similar nonparametric fashion as the pol- 
II activity. Since the mRNA is related to pol-II through a differential equation, 
the GP prior of mRNA can be computed from the GP prior of pol-II through the 
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differential equation. In particular, as shown in Eq. (S3), the rnRNA time series 
is an integral of the pol-II time series. Since integration is a linear operation, 
the expected mRNA time series is an integral of the expected pol-II time series; 
that is, the GP mean function of mRNA is an integral of the mean function of 
pol-II, so that 


Hm{t) = E[m{t)\ = m 0 e“ (t ° + 

— (1 - e~ at ) + /3e~ at [ e au E[p(u - A)]du 
« Ju =o 

= m 0 e-“ (t - to) + — (1 - e~ at ) + (3e~ at [ e au p p du 
a Ju =A 

= m 0 e- a{t - to) + — (1 - e~ at ) + ^(1 - e -“ ( *" A) ) (S7) 

a a 

where the third line follows since pol-II activity starts at t = 0 min. Note that 
the start of pol-II activity at t = 0 min is for mathematical convenience, and 
the initial observation time to will be set to a sufficiently large value so that the 
t — A > 0 min for all t > to and hence observed mRNA values are integrated 
over active pol-II only regardless of delay A. 

We next compute the corresponding covariance function for the GP prior 
of mRNA. The covariance function arises from computing the integral relating 
mRNA to pol-II as follows: 


k m {t,t') = E[(m(t) - - Mm(*'))] 

= E (/3e- at J e au p(u-A)du- 

^/3e~ at ' [ e au 'p{u' - A)d u' - ^(1 - 
V Ju '=0 ^ /_ 

(e~at f e au p ( u _ A ) du _ /^(l _ e -a(t-A)A 

A Ju =A a J 

at ' /* e au 'p{u' - A)du' - ^(1 - e -“ (t '- A) ) 
J u'—A & 


= /3 E 


(S8) 


where the last equality follows since pol-II activity starts at time 0 min. The 
computation of the integrals follows similar steps as computation of the pol-II 
GP covariance. The result is 

k m (t,t') = km t i(t,t') + k m>2 (t,t') + km j3 (t,t') + k m>i (t,tf) (S9) 
where we divided the covariance function into four parts. The first part is 




2a 2 


t _ 1 + exp t A 

a a 


evll- 


+ (4 - i + ?AA 1 ) er [ (t) - ‘ 4)erf( A^)) ( S “) 
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where t A — max(0 min, t — A). The second part is 


, . ,, l 2 Cp/3 2 ( ( 

k m , 2 {t, t ) = 2 ^ 2 ( exp ( 

The third part is 

/ a -3 

k mi 3 (t, t') = -^/nlCpP 2 ( —— exp(a 2 / 2 /4 + a(t A - t' A )) 

(erf(a//2 + t A /l) - erf(aZ/2 + (tA - t' A )/l )) 

3 

+ exp(a 2 / 2 /4 — — at A ){eri{al/2) — erf(aZ/2 — t' A /l)) 

a -3 \ 

-— exp(a 2 t 2 /4 — at' A ) (erf(aZ/2) — erf(aZ/2 — t' A /l)) J . (S12) 

The fourth part is 

(or 3 

k m , 4 (t, t’) = -s/irlCpP 2 f —— exp(a 2 / 2 /4 - a(t A - t' A )) 

(eri(al/2 + t' A /l ) — erf(a//2 — (t A — t' A )/l )) 

a ~3 

H—— exp(a 2 / 2 /4 — at A — at' A ) (erf(aZ/2) — erf(aZ/2 — t A /l)) 

a -3 \ 

-— exp(a 2 t 2 /4 — atA)(erf(c>!Z/2) — erf(aZ/2 — t A /l)) J . (S13) 

F.3 GP joint model 

To define the full GP prior over both pol-II and mRNA, it remains to define the 
cross-covariance function between pol-II and mRNA. The full GP covariance 
is defined by the individual covariances of pol-II and mRNA and the cross¬ 
covariance. 

The cross-covariance function between (noiseless) mRNA abundance in(t) 
at time t and (noiseless) pol-II activity p{t') at time if is computed with similar 
steps as the computation of the mRNA covariance function. The result is 

k m p(t,t') = E[(m(t) - Hm{t)){p{t') - li p (t'))\ 

= k mp> i(t, t') + k mp ,2(t, it) + kmp,3(t, t') (S14) 

where for convenience we separated the kernel function into a sum of three 
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components. The first component part of the kernel can be written as 


1 t ) — 


2a 2 eXP (UJ -° <i + ai 

al t'\ ,f al t' — t A 

erf |_ + _ _ 


VnP 2 ^/Cpl f f al 

2a 2 ° P T 1 - 0<i 


. Oil ^ A \ Oil 

erf Y~ T) _erf ( 2" 


The second component can be written as 


fcmp, 2 (^7 ^ ) — 


& 1 \fcpP 

2a 


exp - 


t A -t' 

l 


t A 

- exp | - ( — 


+ 1 - exp - - 


2\ 1 


The third component can be written as 


kmp, 3(^5 f ) — 


2a 


(t A - t' - l/a)erf {{t A - t')/l) 


— (t A — l/a)evf(t A /l) — ( t' + exp(— at A )/a)eri(t'/l) 


(S15) 


(S16) 


(S17) 


F.4 Observation model 

In order to fit the models of the pol-II and mRNA functions to observations, 
we need an observation model. It is assumed that we observe noisy values 
m(t) = m(t) + e m (t) and p(t) = p(t ) + e p (t) where e m (t) and e p (t) are zero- 
mean Gaussian noise independently sampled for each time point. For simplicity 
we assume the noise variance of e p (t) is a constant a p and infer it as a parameter 
of the model. We estimate the mRNA noise variances <J 2 m {t) for each time point 
t as sums of a shared constant cr^ and a fixed variance inferred by BitSeq 
by combining the technical quantification uncertainty from BitSeq expression 
estimation with an estimate of biological variance from the BitSeq differential 
expression model (full details are in Sec. RNA-seq data processing). 

Since the noise is zero-mean, the GP prior for the noisy observations has the 
same means as the noiseless means, that is, E[m(t)\ = E[m(t)\ and E\p{t)] = 
E[p(t)\. Since the noise is independently added to each observation, the covari¬ 
ance function of observed pol-II becomes 

k p (t, t') = k p (t , t') + S(t, t')a p (S18) 
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where 5(t, t') = 1 if t = t' and zero otherwise, the covariance function of observed 
rnRNA becomes 

= k m (t,t') + S(t,lf)a^ n {t) , (S19) 

and the cross-covariance function between observed pol-II and mRNA is the 
same as the noiseless version so that 

kmp{t,t') = k m p (t, t') . (S20) 

The GP prior over time series functions and the observation model together 
define a full probability model for the pol-II and mRNA data. As the observa¬ 
tions are noisy and available only at a small set of time points, we will apply 
Bayesian inference to infer the underlying time series m{t) and p(t) from the 
observations. 


F.5 Covariance matrix for GP inference 


Given a set of time points, here the 10 time points 


T obs =t 0 + (0,5,10,20,40,80,160,320,640,1280) 
= (*!)■■ • >tjv) 


where to is the initial observation time and the numbers denote time in minutes, 
and the corresponding observation data consisting of N = 10 pol-II observations 
and N = 10 mRNA observations V = (p(ti),... ,p(t]y), m(ii), .. ., m(tjv)), we 
wish to compute the posterior distribution of GP hyperparameters, and to pre¬ 
dict the shape of the underlying time series functions p(t) and m(t) given the 
posterior. We will especially wish to study delay between pol-II and mRNA; 
for mathematical convenience we set to = 300 min and consider mRNA delay 
parameters 0 < A < 300 min. 

For GP inference, given the hyperparameters we must compute the prior 
GP covariance matrix for the observations D. We describe the matrix here 
in a general form which is needed later for inference of time series values at 
previously unseen time points. 

The covariance matrix describes covariance between measurements at one set 
of time points (indexed by rows of the matrix) and another set of time points (in¬ 
dexed by columns of the matrix). Let T row = (t roWt i,..., t roWi N rom ) be a vector 
of N row time indices for rows of the matrix, and let T co ; = (t co i, i,... ,t C o;,Ar coI ) 
be the vector of N row time indices for columns of the matrix. 

The resulting covariance matrix K(T row ,T co i) has the block structure 


K(T row ,T co i) = 


K* K, 


K, 


mp 


pm 

Ka 


(S21) 


where each block is a N row x N co i matrix of covariance function values between 
the time points t € T row and the time points t' e T co i, so that Kp is composed 
of values kp(t,t'), Ka is composed of values km{t,t'), K rh p is composed of the 
cross-covariance values k r np(t, t'), and KpA is composed of the cross-covariance 
values kf n p (t 1 . t). The covariance matrix of observed data is then simply I\ 0 bs = 
K{T obs , T obs )- 
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F.6 Marginal likelihood function 

The analytical tractability of the GP model allows us to marginalise out the la¬ 
tent functions p(t) and m(t) to compute a marginal likelihood that only depends 
on the model parameters. The marginal probability density of the observations 
V is Gaussian and the marginal log-likelihood is 

logP(X>) = (l/2)(-dlog(27r) - log(|A" obs |) - u T K~^ s u) (S22) 

where P denotes the marginal probability density, d = 20 is the total num¬ 
ber of pol-II and rnRNA observations and u is the column vector of observa¬ 
tions with their expected values subtracted, u = [p(t±) — E[p(t\)\,... ,p(tN ) — 

E[p(t N )\,m(ti) - E[m(ti)],.. .,m(t N ) - E[rh(t N )]\ . 

F.7 Posterior prediction 

The analytical tractability of the GP model also allows us to obtain the full 
posterior distribution over the latent functions in closed form given the param¬ 
eters. Given the observed data, we can thus compute the mean and covariance 
of the underlying time series function values at each time point, as expectations 
over the posterior distribution of the underlying functions. For N* new time 
points T* = (ti,, t %,) the posterior mean is 

E[Wi), ■ ■ ■ ■ ■ ■, rh(tN-)m = Krior + K{T*,T obs )K~ b \u (S23) 

where 

u; iior = [E{p(t *)},.. .,E[p(t* N „)},E[m(tl)l.. • ,E[fh(t* N ,)]] r (S24) 

is the vector of prior means computed at the new time points, and the posterior 
covariance matrix is 


Cov\(p(t\),... ,p(t* N ,),rh(tl),m{t* N ,))\V] 

= K(T* ,T*) - K{T*, T obs )K~ h \K{T*, T obs f . (S25) 

The log-likelihood and predictions of function values described here are com¬ 
puted given fixed values of hyperparameters of the GP prior and the observation 
model. We will compute a posterior distribution for the hyperparameters, given 
suitable prior distributions for each. This will allow summarisation of underlying 
pol-II and mRNA functions and GP parameters over the posterior distribution 
of the hyperparameters. We next describe the prior distributions of hyperpa- 
rameters and then describe the sampling based inference of hyperparameter 
posterior distributions. 

F.8 Parameter prior distributions 

All parameters except the delay A have approximately uniform bounded logistic- 
normal priors. These priors were used because of convenience: they allow easy 
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Hamiltonian Monte Carlo sampling that requires very little tuning (see below 
for details). 

The density of the logistic-normal prior logit-normal(^t, cr 2 , a, b) with location 
parameter /. l and scale parameter cr 2 for variable 6 bounded to the interval [a, b } 
is 


p(%,cr 2 ,a,&) = 


1 

\f 7 2/KCJ 2 


exp 


(logit((6> - a)/(b - a)) - /r) 2 \ 
2cr 2 ) 


b — a 

(6-a)(b-0)’ 


(S26) 


where logit(x) = log(x/(l — x)). We use fj, = 0 and cr = 2 which lead to an 
approximately uniform distribution on the interval [a,b\. The interval bounds 
a, b for all variables are presented in Table [Si] 

For the delay A we use a prior with fi = —2, cr = 2 to reflect our prior belief 
that the delays should in most cases be small. For j3 and l we set the priors 
with respect to ft 2 and 2 /l 2 respectively, because these are more convenient as 
model parameters. 


Parameter 

Lower bound a 

Upper bound b 

2 /l 2 

(1280 min) -2 

(5 min) -2 


ss 6.1 ■ 10 -7 min -2 

= 4 ■ 10 -2 min -2 

c p 

2-10- 4 cr 2 Pol2 

2 

a PoL2 


0-05d| > oJ2 

~ 2 

a Pol2 

a 

1 ■ 10 6 min 1 

log(2) min -1 ss 0.69 min -1 

p 2 

1 • 10 -6 min -2 

1 min -2 

A 

0 min 

299 min 

A> 

0 min - 

1 min -1 

mo 

0 

2 

/G 

0 

1 


Table SI: Bounds for bounded logistic-normal priors of differential equation 
parameters in the GP inference of pol-II and mRNA time series. Each parameter 
is bounded to an interval [a, b ], we list the values of the lower bound a and 
upper bound b. Here <7p o;2 is the empirical variance of the pol-II time series 
after preprocessing. 


F.9 Parameter inference by Hamiltonian Monte Carlo sam¬ 
pling 

Given the data and the priors for the parameters, we apply fully Bayesian in¬ 
ference with Hamiltonian Monte Carlo (HMC) sampling 15] to obtain samples 
from the posterior distribution of the parameters. HMC is a MCMC algorithm 
that uses gradients of the target distribution to simulate a Hamiltonian dynami¬ 
cal system with an energy function based on the target distribution. This allows 
taking long steps while maintaining a high acceptance rate in the sampling. 
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In order to apply HMC more easily, we transform all parameters to an un¬ 
bounded space using the logistic transformation. The logistic-normal priors cor¬ 
respond to normal priors on the transformed variables, which effectively prevent 
the sampler from wandering off to the saturated region of the transformation 
near the bounds of the intervals. 

We run 4 parallel chains starting from different random initial states to allow 
convergence checking. We use the HMC implementation from NETLAB toolkit 
in Matlab with momentum persistence and number of leap frog steps r = 20 
which were found to work well in all cases. The step length e is tuned separately 
for every model (see below). After tuning, each chain is run for 10000 iterations. 
The samples are then thinned by a factor of 10, and the first half of the samples 
are discarded, leaving 500 samples from each chain, 2000 in all. Convergence is 
monitored using the potential scale reduction factor 21 . is computed 
separately for each variable, and if any of them is greater than 1.2, the result 
is discarded and a new sample obtained in a similar manner. The 9 genes that 
did not converge after 10 iterations of this process were removed from further 
analysis. In most cases these had severely multimodal delay distributions that 
were difficult to sample from and would have made further analysis difficult. 


F.9.1 Tuning 

The applied logistic transformation and priors together allow using the same 
global step length e for all variables, or using the identity matrix as the mass ma¬ 
trix in the HMC formulation. The step length e was determined by trying differ¬ 
ent alternatives in the set {10~ 5 ,10 -4 ,10 -3 ,0.003,0.005,0.01, 0.03, 0.05,0.07,0.1, 
0.3,0.5,1} in increasing order, running the sampler for 100 steps and using the 
largest value with at least 80% acceptance rate. This target rate is higher than 
usual in random walk MCMC because HMC acceptance rate should be nearly 
100% even with very long steps if the Hamiltonian system is simulated accu¬ 
rately. 


F.9.2 Summarisation of inference results 

The inference results are summarised using the median of the posterior samples. 
This is a convenient statistic because it is invariant to transformations of the 
parameter space, such as those used during the sampling. 


F.10 Validation of the GP modelling results 

In order to validate the GP model, we implemented inference for the same ODE 
using a smoothing spline fit for pol-II. A comparison of the results for the subset 
of genes that yielded reliable results with the spline approach is presented in 
Fig.ISlOl 
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G Filtering of results 

Reliable posterior samples were obtained for models of 4373 genes. 4304 of 
these had multiple-exon transcripts, and could thus be used for intron analyses. 
These genes were further filtered to remove bad fits by only keeping genes that 
satisfy the following: 

1. The global maximum f max of p(t) posterior mean p(t) in the interval t € 
[0 min, 1280 min] occurs in the interval t m ax € (1 min, 160 min). This 
condition ensures the profile has a peak in the densely sampled region 
which is necessary for accurate estimation of the delay. 

2. The posterior median delay A < 120 min. Because of the increasingly 
sparse sampling, longer delay estimates were considered unreliable. The 
specific cut-off was determined by visual inspection of the fits to rule out 
implausible ones. 

3. The posterior mean p(t) of p(t) does not change too much just before 
t = 0 min. This condition is necessary to avoid cases where a long delay 
pushes distinctive features of m{t) to p(t),t < 0 min, which conflicts with 
the assumption that the system is at a steady state until t = 0 min. 
Quantitatively, we define an index 

D = D- — D + = D[_ 30 m in,0 min] — D[ 0 m ; n ,io min] (S27) 


where 


D / 


^max[p(f)] — min[p(i)] 


/ r max \p{t)], 

££[—30 min,1280 min] 


(S28) 


and only include genes with 


D < 0.05. 


(S29) 


Intuitively, Dj looks at the magnitude of change in p(t) in the interval I 
relative to the global magnitude of change in p[t). The final statistic D 
looks for genes that have small changes in [—30 min, 0 min], but forgives 
genes with early large changes in [0 min, 10 min] because these would often 
spill over to t < 0 min because of the properties of the GP model. The 
cut-off 0.05 represents 5% change in magnitude, which seems reasonably 
small. The main conclusions of the work are robust to different cutoffs, as 
demonstrated in Fig. [S5| below. 


After these filtering steps, there were 1814 genes left for the analysis. 

Main results under an additional filter of setting a maximum for posterior 
inter-quartile range are presented in Fig. |S11[ 
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Figure SI: Boxplots of parameter posterior distributions illustrating parame¬ 
ter estimation performance on synthetic data for the mRNA half life ti/ 2 = 
log(2)/a. The strong black lines indicate the ground truth used in data genera¬ 
tion. The box extends from 25th to 75th percentile of the posterior distribution 
while the whiskers extend from 9th to 91st percentile. The model often under¬ 
estimates the half lives, especially in the presence of a significant delay. 


H Synthetic data generation 


The synthetic data were generated by fitting a GP with the MLP covariance 24 
to the Pol-II measurements of the gene TIPARP (ENSG00000163659), and nu¬ 


merically solving the mRNA level using Eq. (S3) with the GP posterior mean 
as p(t). The parameters used were: A £ {0,10,20,30} min, = log(2)/a £ 
{2,4,8,16,32,64} min, /3 0 = 0.005, /3 = 0.03, mo = 0.008/a. The parameter 
values were chosen empirically to get profiles that approximately fitted the ac¬ 
tual mRNA observations while looking reasonable and informative across the 
entire range of parameter values. 


I Supplementary Results 


In this section we provide supplementary Figs. Si] S8 discussed in the 
paper as well as Figs. S9 |Sll|discussed in the Supplementary Methods. 


main 


I.l Estimation of delays under changing mRNA half life 
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Figure S2: Boxplots of parameter posterior distributions illustrating parameter 
estimation performance on synthetic data for the delay parameter A. The 
strong black lines indicate the ground truth used in data generation. The box 
extends from 25th to 75th percentile of the posterior distribution while the 
whiskers extend from 9th to 91st percentile. This is a counterpart of Fig. [3] in 
a situation where the simulated mRNA half-life t\i 2 changes during the time 
course, something our model cannot capture. The simulated changes are point 
changes up or down with a factor of 1.5 or 2 at 80 min. The results show 
that delay estimates remain accurate and reliable, with the true value always in 
the high posterior density region, and demonstrate the conservativeness of the 
estimates with no sign of serious overestimation of small delays. 
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Figure S3: An illustration of how proper summarisation of the RNA-seq data 
is important for ruling out confounding effects from changing splicing patterns 
during the experiment. Left: Gene expression time series of the gene OSGIN1 
from RNA-seq data using either transcript-based RNA-seq data summarisation 
as in the paper or using a simpler summarisation of counts of reads aligned 
uniquely to the mRNA transcripts of OSGIN1. Centre: Proportion of tran¬ 
script ENST00000563543 out of all OSGIN1 transcripts. At 567 bp, this mRNA 
transcript is much shorter than the other major transcripts whose mRNAs are 
around 2 kb. Right: The model fit for OSGIN1 shows no evidence of significant 
delay, while the count-based profile in the left figure would suggest a longer 
delay. 
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Figure S4: Alternative versions of Fig. [H] of the main paper: tail probabilities for 
delays for different cut-offs for D in Eq. ( |S29[ ). Top: D < 0.05 (value used for 
main results), middle: D < 0.1, bottom: D < 0.01. Left: genes whose longest 
pre-mRNA transcript is short (m is the length from transcription start to end). 
Right: genes with relatively long last introns (/ is the ratio of the length of 
the last intron of the longest annotated transcript of the gene divided by the 
length of that transcript pre-mRNA). The fraction of genes with long delays A 
is shown by the red and blue lines (left axis). In both subplots, the black curve 
denotes the p-values of Fisher’s exact test conducted separately at each point 
(right axis) with the dashed line denoting p < 0.05 significance threshold. The 
general shapes of the curves are the same in every case. 
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Figure S5: Alternative versions of Fig. [ 5 ] of the main paper: different cut-offs 
for / and to. Left: genes whose longest pre-mRNA transcript is short (to is the 
length from transcription start to end). Right: genes with relatively long last 
introns (/ is the ratio of the length of the last intron of the longest annotated 
transcript of the gene divided by the length of that transcript pre-mRNA). 
The fraction of genes with long delays A is shown by the red and blue lines 
(left axis). In both subplots, the black curve denotes the p -values of Fisher’s 
exact test conducted separately at each point (right axis) with the dashed line 
denoting p < 0.05 significance threshold. The general shapes of the curves are 
the same in every case. 
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Figure S6: Alternative versions of Fig. [ 5 ] (right) of the main paper: explore 
the dependence on / for genes with a lower bound on mRNA length m. The 
fraction of genes with long delays A is shown by the red and blue lines (left 
axis). The black curve denotes the p-values of Fisher’s exact test conducted 
separately at each point (right axis) with the dashed line denoting p < 0.05 
significance threshold. The general shapes of the curves are the same in every 
case. 



Figure S7: Proportion of long posterior median delays for genes with/without 
annotated exon skipping. The fraction of genes with long delays A is shown by 
the red (no skipped exons) and blue (skipped exons) lines (left axis). The black 
curve denotes the p -values of Fisher’s exact test conducted separately at each 
point (right axis) with the dashed line denoting p < 0.05 significance threshold. 
There is no clear difference between the two groups. 
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Figure S8: Alternative version of Fig. [6] (right) of the main paper: computing 
the index based on pol-II ChIP-seq and GRO-seq instead of intronic RNA-seq 
reads. The top plots show differences in the mean pol-II accumulation index in 
long delay genes (blue) and short delay genes (red) as a function of the cut-off 
used to distinguish the two groups (left axis). Positive values indicate increased 
pol-II accumulation at the 3’ end (top left: last 50% of the gene body, top 
right: last 5% of the gene body) over time. The black line shows the p-values of 
Wilcoxon’s rank sum test between the two groups at each cut-off (right axis). 
The bottom plot is the same as top right, except for GRO-seq data of 25 , with 


the index is defined as the difference between the only late (160 min) time point 
and the average of the early (0-40 min) time points. In contrast to the pre- 
mRNA figure in the main paper, both long and short delay genes show a clear 
tendency towards accumulation of pol-II towards the end of the gene, but there 
is no clear difference between the two groups for the last 50% (top left), while 
there is a very consistent pattern of more pol-II accumulation very close to 3’ 
end (top right) for long delay genes, and the level is essentially independent of 
the estimated delay. GRO-seq data in the last 5% (bottom) behave similarly as 
pol-II ChIP-seq (top right). 
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Figure S9: Distributions of per-gene rnRNA and pre-mRNA counts and cover¬ 
ages based on BitSeq expression estimates. Top two rows show broad distribu¬ 
tions for all genes, while the bottom two rows show distributions biased toward 
higher values for the selected 1786 genes. The results show that the mRNA 
coverages are mostly clearly higher than for pre-mRNA, again demonstrating a 
sensible split between pre-mRNA and rnRNA. 
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Figure S10: Comparison of estimated posterior median delays from the GP 
model with an alternative spline-based model for n = 15 genes with reliable es¬ 
timates. In this model we used cubic smoothing splines to fit continuous curves 
to pol-II measurements. To account for the uneven sampling the times were 
transformed as t' = log(f/min + 5). The regularisation strength was shared 
over all genes and optimised by leave-one-out cross validation over all internal 
time points. The time transformation was also found to work much better than 
untransformed time in the cross validation. The smoothed pol-II curves were 
used as input to Eq. (S2) which was solved numerically to obtain predictions 
for m(t). Assigning a Gaussian noise model to m(t) similar to the GP model 
and using similar priors for all shared parameters, we run MCMC to obtain 
posteriors over the parameters. We were only able to obtain reliable parameter 
estimates for a small subset of genes for which the method had a good fit (mea¬ 
sured through expected relative residual variance) for both pol-II and mRNA. 
The other estimates were unreliable presumably because the method estimated 
the pol-II profiles independently but then ignored the uncertainty related to this 
estimation, which further highlights the benefits of the GP approach. 
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Figure SI 1: Alternative versions of Figs. [H] and [founder more stringent filtering 
of delay posterior interquartile range (IQR). 
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